Big Tech Stock Prices
An analysis of 14 Big Tech stocks from 2010 to 2022
Abstract
The goal of this project is to understand the temporal behavior of Big Tech stocks that are actively traded on major U.S. exchanges (the NYSE and NASDAQ). This analysis will show trends in the market to identify times when the market was in a downtrend (bear market) or in an uptrend (bull market). This will provide insight into times when investments should be made or when they should be pulled back. Investment analyses such as the aforementioned are extremely valuable to large financial institutions that aim to benefit from growth in the market. They are equally important for those who place their hard-earned dollars with these institutions as a means to save for retirement.
The analysis will consist of answering two key questions, each through a sequence of steps: exploratory data analysis (EDA), data wrangling, and data visualization. This process is key to gaining insight from the data and provides a clean platform that can be used to train machine learning models.
Introduction to the Dataset
The dataset used in this analysis was sourced from Tidy Tuesday, an open source collection of datasets for visualization and machine learning. The dataset contains the stock prices and trading volumes of 14 different Big Tech companies from 2010 to 2022 on U.S. stock exchanges (the NYSE and NASDAQ). There are two .csv (comma-separated values) files used, one called “big_tech_stock_prices.csv” and another called “big_tech_companies.csv”. For simplicity, the files are renamed “stocks” and “companies”, respectively. Stocks contains the majority of the information used in this analysis, including the daily low, high, open, and close prices, as well as trading volume. Low price represents the lowest price for the day, while high price represents the highest. Open price represents the price when the market opens at 9:30am ET, and close price represents the price when the market closes at 4pm ET. Trading volume represents the number of shares bought and sold that day; this metric is important for understanding the popularity of the stock.
The interest in this dataset was sparked by its potential to provide valuable insights into the stock market behavior of big tech companies. This enables a comprehensive examination of stock market trends, facilitates comparisons across various companies, and aids in assessing the influence of significant external events on stock valuations.
Q1: How do stock prices change over time based on basic economic metrics?
Introduction
To answer the question of how stock prices change over time, the daily price movement and volume will be visualized. Looking at price changes temporally will provide insights into past trends in the stock market. The stock market is cyclical, and thus finding patterns in the past data may offer insights into future movements of the stock market. This is beneficial because it can suggest when a stock price is likely to rise. The interest in this question is based on team interest in investing and making low-risk financial decisions.
Approach
To begin the analysis of question one, the data is first summarized with basic statistics such as the mean, median, and standard deviation for both price and trading volume. This allows for a basic understanding of the distribution of the data and how features such as outliers influence its spread. Once a basic understanding of the data is formed, the data is then visualized.
One step that is typically used for raw, uncleaned data is filtering. This allows the data to be manipulated to fit a normal distribution (or another distribution required by the ML model). In the case of this analysis, filtering and cleaning are not needed because the data is already in a form that can be visualized easily.
Finally, the data is visualized in three plots: a candlestick plot, a line plot, and a bar plot. In the candlestick plot, each trading day is drawn as a bar that shows the open, close, high, and low prices: the body of the bar spans the open and close prices, and the thin lines above and below it extend to the high and low. Viewed across the entire time frame, the sequence of bars reveals trends. The line plot will be used to show the closing prices over time. The bar plot will show the overall trading volume of each stock across the entire time frame. These three plots will be plotly objects, and thus can be manipulated and zoomed in on to see trends both globally and locally over the time period. The candlestick plot provides a convenient display of all the data, while the line and bar plots break it down so the data can be scrutinized on a magnified scale.
Analysis
Discussion
The findings from data preprocessing and visualization reveal multiple insights. First, beyond cyclical fluctuations, the price of every stock has increased since its initial open in the dataset. Additionally, not all companies were publicly traded at the start of 2010, and thus some stocks do not appear until later. Another valuable insight is the large increase in growth of a majority of the companies in 2018. If this trend were to continue, it would be beneficial to invest in the market now. However, it would be wise to look at other companies that have experienced exponential growth and review the period before their prices settled to a lower value; the dot-com boom of the early 2000s would be a good case study.
Looking at the bar plot of all companies, it is clear that Apple was the most popular stock in the dataset: its volume is 150% larger than that of the next most traded stock. Finally, the crash caused by the COVID-19 pandemic is clearly visible across all the companies: in March 2020, stock prices plummeted.
Q2: Backwards verification: if we invested x dollars in 2010, how much would it be worth in 2022, and when would be a good or bad time to pull the investment out of the market?
Introduction
A fundamental question for any investor is understanding how an initial investment grows over time. This analysis aims to determine the value of an investment made in 2010 in various technology stocks and its worth in 2022. To answer this, we’ll need the historical closing prices of each stock, which will help us to calculate the investment’s growth over the specified period. The second part of the analysis aims to determine good and bad times to pull an investment out of the market.
First, we’re loading up all the stock and company information from two CSV files. We then make copies of this data to work with, ensuring we don’t mess with the original files. The goal here is to gather basic statistics like the average, median, highest, lowest, and standard deviation for various aspects of the stock prices, such as the opening and closing prices. This provides us with a summary that gives a broad overview of each stock’s performance over time.
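A minimal sketch of this summary step is shown below. It is not the project's actual code: the small DataFrame stands in for the contents of “big_tech_stock_prices.csv” (which would normally be loaded with `pd.read_csv`), its values are made up, and only two of the columns are summarized.

```python
import pandas as pd

# Tiny stand-in for big_tech_stock_prices.csv; the values are made up for illustration.
stocks_raw = pd.DataFrame({
    "stock_symbol": ["AAPL", "AAPL", "AAPL", "MSFT", "MSFT", "MSFT"],
    "open": [10.0, 12.0, 14.0, 20.0, 22.0, 27.0],
    "volume": [100, 300, 200, 50, 70, 60],
})

# Work on a copy so the originally loaded frame stays untouched.
stocks = stocks_raw.copy()

# One row per symbol, one (column, statistic) pair per output column.
summary = stocks.groupby("stock_symbol")[["open", "volume"]].agg(
    ["mean", "median", "min", "max", "std"]
)
print(summary)
```

Grouping by `stock_symbol` before aggregating is what produces one row per company with a two-level column header, the same layout as the summary output in this report.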
Summary Statistics:
open \
mean median min max std
stock_symbol
AAPL 51.274171 29.745001 6.870357 182.630005 47.316509
ADBE 186.023938 97.589996 22.969999 696.280029 173.562115
AMZN 58.937204 36.325001 5.296500 187.199997 54.138607
CRM 103.419948 76.290001 15.522500 310.000000 71.287894
CSCO 33.493880 29.500000 13.930000 64.040001 12.643887
GOOGL 49.149625 38.521000 10.968719 151.250000 35.809816
IBM 148.427471 143.173996 90.439774 205.908218 23.998357
INTC 36.481406 34.270000 17.879999 68.199997 12.912534
META 147.874275 141.620002 18.080000 381.679993 86.734656
MSFT 100.048490 55.660000 23.090000 344.619995 88.228008
NFLX 188.242206 110.010002 6.960000 692.349976 178.967463
NVDA 50.560335 11.902500 2.180000 335.170013 69.538684
ORCL 46.242177 41.750000 21.459999 104.290001 16.864193
TSLA 58.859467 16.229000 1.076000 411.470001 95.677282
high ... \
mean median min max std ...
stock_symbol ...
AAPL 51.845876 29.980000 7.000000 182.940002 47.926721 ...
ADBE 188.208985 98.239998 23.360001 699.539978 175.674664 ...
AMZN 59.610805 36.500000 5.564500 188.654007 54.785319 ...
CRM 104.720219 77.160004 15.625000 311.750000 72.149412 ...
CSCO 33.804014 29.770000 14.120000 64.290001 12.760317 ...
GOOGL 49.638485 38.930000 11.068068 151.546494 36.215672 ...
IBM 149.571376 144.160004 93.441681 206.405350 24.013109 ...
INTC 36.875197 34.580002 17.920000 69.290001 13.090012 ...
META 149.757980 143.415000 18.270000 384.329987 87.818979 ...
MSFT 101.039621 56.000000 23.320000 349.670013 89.181105 ...
NFLX 191.178020 111.900002 7.178571 700.989990 181.488846 ...
NVDA 51.504697 11.982500 2.262500 346.470001 70.979752 ...
ORCL 46.696918 42.000000 21.680000 106.339996 17.059092 ...
TSLA 60.174863 16.491000 1.108667 414.496674 97.873400 ...
adj_close \
mean median min max std
stock_symbol
AAPL 49.445122 27.385101 5.846675 180.959732 47.810585
ADBE 186.022299 97.720001 22.690001 688.369995 173.466083
AMZN 58.905287 36.382500 5.430500 186.570496 54.085034
CRM 103.400510 76.260002 15.520000 309.959991 71.213839
CSCO 28.624463 23.570276 9.743538 61.521923 13.590111
GOOGL 49.148954 38.538502 10.912663 149.838501 35.803048
IBM 113.148579 113.695961 75.138626 150.570007 14.336266
INTC 31.310983 28.492294 12.135988 64.383247 13.745841
META 147.913244 142.065002 17.730000 382.179993 86.763257
MSFT 95.285446 50.052330 17.769510 339.924835 89.253596
NFLX 188.252178 110.099998 7.018571 691.690002 178.877130
NVDA 50.282939 11.685297 2.037410 333.407379 69.495321
ORCL 42.572566 37.007454 17.991089 101.501656 17.867223
TSLA 58.805222 16.222334 1.053333 409.970001 95.544413
volume
mean median min max std
stock_symbol
AAPL 2.563255e+08 166674000.0 35195900 1880998000 2.225768e+08
ADBE 3.814337e+06 2948500.0 589200 108752400 3.598144e+06
AMZN 8.833999e+07 74592000.0 17626000 848422000 5.309249e+07
CRM 6.910973e+06 5548800.0 1084700 64562800 5.048860e+06
CSCO 3.269656e+07 25482400.0 5720500 560040200 2.570963e+07
GOOGL 6.018647e+07 41234000.0 9312000 592399008 4.957963e+07
IBM 5.036545e+06 4345189.0 1247878 39814421 2.772073e+06
INTC 3.607170e+07 29874600.0 5893800 199002600 2.123178e+07
META 3.117815e+07 23239000.0 5913100 573576400 2.713267e+07
MSFT 3.801647e+07 32280800.0 7425600 319317900 2.147328e+07
NFLX 1.841485e+07 11961800.0 1144000 315541800 2.054316e+07
NVDA 5.080613e+07 43395600.0 4564400 369292800 3.210953e+07
ORCL 1.801856e+07 14699800.0 2754900 183503900 1.251053e+07
TSLA 9.351647e+07 75914250.0 1777500 914082000 8.164780e+07
[14 rows x 30 columns]
Next, we check that the data has no gaps (missing values). Then we look for extreme data points, known as outliers, that might throw off the analysis. We use the Z-score method to find these outliers: a Z-score measures how many standard deviations a data point lies from the mean. If a point is more than 3 standard deviations from the mean, we remove it to keep the data clean.
Null Values from Stocks dataset
stock_symbol 0
date 0
open 0
high 0
low 0
close 0
adj_close 0
volume 0
dtype: int64
Null Values from companies dataset
stock_symbol 0
company 0
dtype: int64
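The null check and the Z-score filter described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the data is synthetic, the symbol `DEMO` is hypothetical, and computing the Z-score separately for each stock (rather than over the whole table) is an assumption about how the filter was applied.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices: 30 ordinary days plus one extreme value (all made up).
rng = np.random.default_rng(0)
close = np.append(rng.normal(100.0, 1.0, 30), 500.0)
stocks = pd.DataFrame({"stock_symbol": "DEMO", "close": close})

# Count missing values in each column.
null_counts = stocks.isnull().sum()

# Z-score within each symbol: distance from that symbol's mean, in standard deviations.
grouped = stocks.groupby("stock_symbol")["close"]
z = (stocks["close"] - grouped.transform("mean")) / grouped.transform("std")

# Keep only rows within 3 standard deviations of the mean.
cleaned = stocks[z.abs() <= 3]

print(null_counts)
print(len(stocks), "rows before,", len(cleaned), "rows after")
```

One design point worth noting: stock prices trend strongly over a decade, so a Z-score on raw prices tends to flag the highest recent prices rather than data errors. Applying the same filter to daily returns is a common alternative.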
Approach
To understand how an investment has grown, we’ll look at the stock prices at the start of 2010 and compare them to the end of 2022. We’ll adjust these prices so that they all start at the same point, which makes it fair to compare different stocks. This way, we’re not distracted by some stocks being naturally more expensive than others; we’re focusing purely on how much they have grown proportionally.
We’re going to use line charts to show this growth over time. Each company’s growth curve will have its own color, making it easy to see which stocks are stars and which are not. These charts help us spot the times when stocks were soaring and when they were not doing so well.
To understand when to buy or sell a stock, we’ll use a common technique called the moving average crossover. Think of this as tracking two different running averages of a stock’s price: one that looks at the last 50 days (short-term view) and another that looks at the last 200 days (long-term view). When the short-term line crosses above the long-term line, it’s like a green light that the stock’s price might be heading up—a hint that it might be a good time to buy. When it crosses below, it’s a red flag that prices could be going down, suggesting it might be time to sell.
We’ll map these two averages onto a chart, which will help us spot exactly where these crossovers happen. It’s a straightforward method but well-regarded in the finance world for spotting when a stock’s trend might be changing direction.
Our analysis will calculate these two averages for each stock and lay them over a chart of the stock’s actual prices. We’re looking for where these average lines cross over each other—these are the critical moments that might signal to investors to act. To showcase this, we’ll use a type of chart called a candlestick chart, which is great for showing not just the average trends, but also giving us a detailed snapshot of stock price movements over time.
Analysis
Let’s assume we put $1,000 into each stock at the beginning of 2010. We’ll use our adjusted prices to track how much that $1,000 would have turned into by 2022. By looking at the charts for each stock, we can see the final tally for our investment as of 2022, giving us a clear picture of where our hypothetical $1,000 would have taken us over 12 years.
In this part, we take the list of stock symbols and create a line graph for each stock that shows how its closing price has changed day by day. The result is a colorful chart where each line represents a company’s stock, making it easy to see how stock prices have moved over time.
In this part, we’re starting all the stocks from the same line (‘normalization’), so we can compare them fairly. Imagine every stock begins at $10 in 2010, and we track how this value changes. This gives us a clear picture of each stock’s performance relative to the others, regardless of their actual price differences. We draw this out on a chart, so it’s easy to follow their growth over the years.
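The normalization and the hypothetical $1,000 investment can be sketched together, since both come from dividing each price by the stock's first price. The symbols `AAA` and `BBB`, the dates, and the prices below are made up, and using `adj_close` as the price column is an assumption.

```python
import pandas as pd

# Made-up adjusted closing prices for two hypothetical symbols.
stocks = pd.DataFrame({
    "stock_symbol": ["AAA"] * 3 + ["BBB"] * 3,
    "date": pd.to_datetime(["2010-01-04", "2016-01-04", "2022-12-29"] * 2),
    "adj_close": [5.0, 20.0, 50.0, 100.0, 150.0, 200.0],
})
stocks = stocks.sort_values(["stock_symbol", "date"])

# Divide each price by that symbol's first price so every series starts at 1.0.
first_price = stocks.groupby("stock_symbol")["adj_close"].transform("first")
stocks["growth"] = stocks["adj_close"] / first_price

# Scale the growth factor by the initial stake to get a dollar value.
initial_investment = 1_000
stocks["value"] = stocks["growth"] * initial_investment

# Value of the stake on the last available date for each symbol.
final_value = stocks.groupby("stock_symbol")["value"].last()
print(final_value)
```

In this toy data the cheaper stock grows tenfold while the more expensive one only doubles, which is the kind of difference the raw price chart hides and the normalized chart makes visible.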
In this part, we’re focusing on Apple’s stock (AAPL) as an example, and we’re using two smoothed-out lines to help us decide when might be a good time to buy or sell. These lines represent the average closing price over the last 50 and 200 days. We plot these on a special kind of graph called a ‘candlestick chart’, which not only shows the average trends but also the daily price movements in more detail. This chart helps us spot those key moments where the short-term average crosses over the long-term average, which can signal whether it’s potentially a good time to get in or out of the market.
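The crossover detection itself can be sketched without the chart. The example below uses a synthetic price series (a decline followed by a rise) rather than the AAPL data, and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic closes over 600 business days: a decline followed by a steady rise.
close = pd.Series(
    np.concatenate([np.linspace(100, 60, 300), np.linspace(60, 140, 300)]),
    index=pd.bdate_range("2010-01-04", periods=600),
)

# Simple moving averages over 50-day and 200-day windows.
short_ma = close.rolling(window=50).mean()
long_ma = close.rolling(window=200).mean()

# 1 where the short average is above the long one, 0 otherwise,
# restricted to days where the 200-day average is defined.
valid = long_ma.notna()
above = (short_ma > long_ma)[valid].astype(int)

# A change from 0 to 1 is an upward crossover; from 1 to 0 a downward one.
cross = above.diff()
buy_dates = cross[cross == 1].index
sell_dates = cross[cross == -1].index

print("buy signals:", list(buy_dates.date))
print("sell signals:", list(sell_dates.date))
```

Because both averages lag the price, the buy signal in this series appears some time after the price has already bottomed out. That lag is the main trade-off of the method.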